Parallel, linear-scaling building-block and embedding method based on localized 

orbitals and orbital-specific basis sets. 
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We present a new linear scaling method for the energy minimization step of semiempirical and first- 
principles Hartree-Fock and Kohn-Sham calculations. It is based on the self-consistent calculation 
of the optimum localized orbitals of any localization method of choice and on the use of orbital- 
specific basis sets. The full set of localized orbitals of a large molecule is seen as an orbital mosaic 
where each tessera is made of only a few of them. The orbital tesserae are computed out of a set 
of embedded cluster pseudoeigenvalue coupled equations which are solved in a building-block self- 
consistent fashion. In each iteration, the embedded cluster equations are solved independently of 
each other and, as a result, the method is parallel at a high level of the calculation. In addition to full 
system calculations, the method enables to perform simpler, much less demanding embedded cluster 
calculations, where only a fraction of the localized molecular orbitals are variational while the rest are 
frozen, taking advantage of the transferability of the localized orbitals of a given localization method 
between similar molecules. Monitoring single point energy calculations of large poly(ethylene oxide) 
molecules and three dimensional carbon monoxide clusters using an extended Hiickel Hamiltonian 
are presented. 
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I. INTRODUCTION 

The development of linear scaling computational meth- 
ods for electronic structure calculations in molecules and 
solids with a very large number of atoms, (i.e. methods 
whose computational demands grow as the first power 
of the size of the system,) has been a very active and 
successful field in the last decade. 1 ' 2 Linear scaling and 
low order scaling techniques exist for the computation 
of the one-electron effective Hamiltonian matrix in first- 
principles calculations (with density functional theory 
and wave function based methods), 3 ' 4,5 ' 6,7 ' 8 as well as 
for the energy minimization step, 1 - 2 ' 9 ' 10 ' 11 ' 12 - 13 ' 14 ' 15 ' 16 ' 17 
which is a common step to first-principles and semiem- 
pirical calculations. In low order scaling energy mini- 
mization methods, the traditional cubic scaling diago- 
nalization of the matrix representation of the effective 
one-electron Hamiltonian in a finite basis set, 18 which 
leads to the canonical orbitals, is substituted by differ- 
ent algorithms which, either solve directly for the unique 
optimal density matrix, 9 ' 10,11 ' 16,17 or for some sort of ar- 
bitrary optimal localized orbitals. 5 ' 12 ' 14 ' 15 In parallel to 
linear scaling electronic structure methods, a significant 
development has also been made in embedding methods, 
which focus the computational effort on local properties 
of a system 19 ' 20 (see Refs. 21 and 22 for recent reviews). 

In this paper, we present a new method for the en- 
ergy minimization step which can be used in semiem- 
pirical and first-principles Hartree-Fock and Kohn-Sham 
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calculations. It is expected to be useful as a comple- 
ment of linear-scaling methods for the computation of 
the Hamiltonian matrix, which are nowadays available 
for local exchange-correlation potentials and Coulomb 
potentials, 3,4 ' 5 ' 6 ' 7 as well as for exact exchange fields. 8 
The energy minimization method is based on exploit- 
ing the self-consistent calculation of the optimum local- 
ized orbitals of any localization method of choice and 
the use of orbital-specific basis sets. The n occupied 
localized orbitals of a very large molecule which corre- 
spond to a chosen localization method (e.g. the popular 
methods of Boys, 23 Edmiston-Ruedenberg, 24 and Pipek- 
Mezey, 25 or any other available or newly developed lo- 
calization method) can be regarded as a mosaic of or- 
bitals, where each of its component tesserae contains only 
a few orbitals. In the present method we define an orbital 
tessera as a subset of the occupied localized orbitals (of 
the method of choice) which are localized in some region 
of real space and compute the localized orbitals of each 
tessera out of one specific pseudoeigenvalue equation to 
be solved in a basis set expansion approximation, using 
a basis set specific to the tessera, or orbital-specific basis 
set. In other words, a set of building-block embedded 
cluster pseudoeigenvalue coupled equations, one for each 
tessera, is solved in a self-consistent manner. Doing so, 
the method becomes parallel (the tesserae are computed 
independently of each other in the self-consistent proce- 
dure) and exhibits a linear-scaling dependence with the 
size of the molecule. We call the present method Mosaico. 

Early works on localized orbitals proposed the ideas 
of computing them directly by a self-consistent proce- 
dure 24,26 and computing one or several localized orbitals 
out of separate eigenvalue equations starting with a set of 
qualitatively localized orbitals. 27 These ideas have been 
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used by several authors to propose methods leading to 
some particular sets of localized orbitals or to localized 
orbitals dependent on the initial guess. 12 ' 28,29 ' 30,31 We 
apply them to the computation of the occupied localized 
orbitals of any localization method of choice. Here, we 
do not pay attention to the computation of virtual lo- 
calized orbitals; methods for the determination of virtual 
localized orbitals useful in wave function based correla- 
tion methods have been used already in early works 28 . 
The idea of using different basis sets for different regions 
of the system is also present in early work, 27 it has been 
used and discussed by several authors, 5,12 ' 31,32 and is a 
common procedure in embedded cluster and effective core 
potential calculations. 20,21 

Besides its use as a linear scaling method in large 
molecules and solids, the Mosaico method can be used 
as an embedded cluster method, where only a few local- 
ized orbitals of a large system are treated variationally 
while the rest is taken from a calculation on a similar 
molecule and frozen, with obvious computational advan- 
tages. This can be done because the solutions of the 
building-block embedded cluster coupled equations arc 
the localized orbitals corresponding to a given localiza- 
tion method of choice, which enables their transferability 
between similar molecules. 

In Sec. II we present the details of the Mosaico 
method. We performed monitoring calculations on large 
poly(ethylen oxide) molecules and three dimensional car- 
bon monoxide clusters using an extended Hiickcl scmiem- 
pirical Hamiltonian, which are presented in Sec. III. 
They are aimed at showing the convergence of the par- 
allel calculation to the right solution, the convergence of 
the total energy with the size of the orbital-specific basis 
sets towards the exact value, the linear-scaling character- 
istics of the method, and the performance of embedded 
cluster approximations. 



II. METHOD 

A. Basics of the method 

Let us consider the Hartree-Fock equations in wave 
function based ab initio methods, 18,33,34 the Kohn-Sham 
equations in density functional theory, 35 or the effective 
Hartree-Fock equations in semiempirical methods. 36 The 
canonical form of the spin-restricted closed-shell version 
of these equations can be written as 



Fip 



can ,„can _ 

- <P £ ■ 



(1) 



with an appropriate choice of the one-electron Hamilto- 
nian F for each case, where ip can is a row vector of n 
occupied molecular orbitals, 



,.can\ I , A can\ | ,_can\\ 



(2) 



and e is an n x n diagonal matrix of orbital energies. 
This and what follows can be generalized to the spin- 
unrestricted cases if the orbitals and the Hamiltonian in 



Eqs. 1 and 2 are adequately substituted by the a and (3 
choices; 37 here we will continue with the detailed descrip- 
tion of the spin-restricted closed-shell case for the sake of 
clarity. The virtual orbitals, f mr , are also solutions of 
Eq. 1; in this paper, however, we will focus our attention 
on the occupied spectrum and, unless specified, we will 
refer only to different sets of occupied orbitals from now 
on. 

The Fock-Dirac one-electron density operator is de- 
fined as 



p= ! p can <p ca ^ = ^2\v>r n )(v>i 



(3) 



It is invariant under arbitrary unitary transformations of 
the occupied orbitals, 



, „L can ttL 

<£ =<£_ canLL , 

can can\ L Lt 

p = tfi if = lf if , 



(4) 
(5) 



where we use the notation can U_ for the unitary matrix 
that transforms canonical occupied orbitals onto local- 
ized orbitals of a given localization method, which we 
label L. Also, p is the projection operator of the occu- 
pied space, 



p ip can = <p can . 



p<£ = lf , 
p<p vir = 0. 



(6) 



The fact that wave function, one-electron density, to- 
tal energy, and Fock operator are invariant under unitary 
transformations within the occupied space (Eq. 4) has 
been exploited to define localized orbitals, which are use- 
ful to facilitate large scale calculations and to bridge the 
gap between extensive numerical calculations and qual- 
itative chemical thinking. A localization method, say 
L, can be defined by its choice of C anLL L - Very sound 
and popular localization methods are the methods of 
Boys, 23 Edmiston-Ruedenberg, 24 and Pipek-Mezey, 25 al- 
though others have been proposed. All of the above can 
be used in first-principles methods; Pipck-Mezey's can 
also be applied in semiempirical methods. The common 
procedure to compute localized orbitals is to complete 
firstly a canonical calculation and, later, use the canoni- 
cal orbitals in an iterative optimization process converg- 
ing to canLL L ■ Gilbert 26 has pointed out that any set of 
occupied localized orbitals formally fulfills the eigenvalue 
equation of an effective Fock (or Kohn-Sham) Hamilto- 
nian defined as F L = F — pFp + pLp, 



F L (p L 



F-pFp+pLp 



A. 



(7) 



where L is a Hermitean localization operator and A is a 
diagonal matrix whose diagonal elements are the eigen- 
values of F L and of L, 



A = <f 



L t L ip L . 



(8) 
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Localization operators L corresponding to the above 
mentioned localization methods arc, in general, not 
known and Eq. 7 has not been exploited to compute 
the localized orbitals via diagonalization of the matrix 
pcanj L^pcan^ ^ Q our knowledge. 

Eq. 7 has been discussed by several authors 12,24,31 and 
it has been used as a basis for a building-block technique 
based on a self-consistent series of embedded cluster cal- 
culations. 12 The iterative solution of the building-block 
equations of Ref. 12, which contain arbitrary localization 
operators, leads to localized orbitals dependent on the 
initial guess and on the iteration procedure. Although in 
principle this is not detrimental for total energy calcu- 
lations, it is an undesirable property because it creates 
problems of transferability of localized orbitals between 
similar molecules. 

Here, on the basis of the same ideas than in Ref. 12, 
we present a new building-block and embedding method 
that, starting from an arbitrary guess and a choice 
of a particular localization method (Boys', Edmiston- 
Ruedenberg's, Pipck-Mezey's, or any other), leads, in 
a controlled manner, to the corresponding localized or- 
bitals. We will call this method Mosaico. 

Let us suppose we set the goal of computing the n oc- 
cupied localized orbitals corresponding to a localization 
method L for the ground state of a molecule, 



</* = (|^>,|^),...,|^}) 



(9) 



and related properties such as electron density and total 
energy. We see the whole set of localized orbitals as a mo- 
saic of orbitals, and define subsystem, fragment, cluster, 
or tessera as a subset of these orbitals which are local- 
ized in some region of real space. (For example, in an 
organic acid R-COOH we may define one of the subsys- 
tems or tesserae as that made of nine orbitals localized 
in the spatial region close to the COOH nuclei.) The 
terms subsystem, fragment, and cluster have been used 
by many authors with different meanings and we prefer 
to use the term tessera all over the paper for the present 
definition of subsystem. In this way, the whole mosaic 
of localized orbitals is made of N tesserae A, B, . . . , and 
the vector of localized orbitals can be rewritten as 



(10) 



where p L A is a row vector with the ua occupied orbitals 
of tessera A, 



& = (\?Al) A A<fA nA )) > 

a row vector with the rig orbitals of tessera B, 



(11) 



V L B = (\VBl)AVB2),---AVBn B )) , (12) 

and so on, with ua + n B + ■ ■ ■ = n. The orbitals of a 
tessera define a subspace of the occupied space whose 
density and projection operator is 



The density operators of the tesserae fulfil 
PaPb = SabPa , 

N 

P=^2pb, 

B=l 

pp A = PAP = PA , 



(14) 
(15) 
(16) 



where the sum in Eq. 15 extends over all the N tesserae 
of the system. 

In the Mosaico method we seek to compute the lo- 
calized orbitals of each tessera out of its own eigenvalue 
equation: 



tessera A 
tessera B 

P B^B 



F - pF p + pL A p 
F - pF p + pL B p 



1-A 1-A 



<^A A ,(17) 



^A B (18) 



(X A i As j • • ■ > being diagonal matrices of size »a x ha, 
n B x n Bl . . . ,) under the conditions of Eqs. 13-16. These 
conditions are fulfilled if all the orbitals are eigenfunc- 
tions of the same Hcrmitean operator. In other words, 
all the orbitals ip 1 ^ for B ^ A must be eigenfunctions of 

F%, all the orbitals <p L for A ^ B must be eigenfunc- 
tions of F B , and so on. This means that the subsystem 
localization operators La, L b , ... of Eqs. 17, 18, ... 
must be such that the effective subsystem Hartree-Fock 
or Kohn-Sham Hamiltonians F](, F b , . . . , have the same 
eigenfunctions as F L in Eq. 7 but different eigenvalues. 
(Note that the virtual orbitals ip mr are also eigenfunc- 
tions of Eqs. 17, 18, . . . since pip mr = 0; their calculation 
has not been stated as a goal here and they will not be 
referred to in the rest of the Section.) 

As commented above, the localization operator L in 
Eq. 7 corresponding to a localization method L is usu- 
ally not known. However, every localization method L 
has a well defined procedure for the computation of the 
unitary transformation matrix C anlL L (Eq. 4) in a given 
molecule. 23,24,25 This procedure can also be applied to 
any orthogonal basis of the occupied space ip^ other 
than the canonical orbital basis ip can - j the result is then 

the unitary matrix qU_ l that transforms the initial non- 
canonical set of occupied orbitals onto the L-method lo- 
calized orbitals, 



(19) 



At this point, we can express the operator L of Eq. 7 as 
its spectral representation in any basis of the occupied 



space, e.g. ip 



(o) 



' \<p L * = <p<® oU L A oC/ L V 0)t - 



(20) 



(13) 



Now, consistently with the discussion following 
Eqs. 17, 18, . . . , we can define the following Hermitean 
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subsystem or tessera localization operators: 

La = V L AA(n)V Lt =V (0) oU L X A(n) ot/ L V (0)t (^l) 
Lb = ^ L X B(n) ^ =^ Q \U L A B(jl) oC/ L V 0)t (22) 



where A^^) , X.B(n) > ■ ■ ■ are diagonal matrices of size nxn 
with arbitrary diagonal real data. A choice consisting on 
low values for the n A diagonal elements of X^n) corre- 
sponding to the localized orbitals of tessera A, ip L A , and 
sufficiently higher values for the n — n A remaining diag- 
onal elements, guarantees that the localized orbitals of 
tessera A are computed as the lowest ua eigenvectors 
of the effective Hartree-Fock or Kohn-Sham Hamiltonian 
F A (Eq. 17), which is a convenient choice for safe and ef- 
ficient orbital selection in the iterations of self-consistent 
procedures. Obviously, the same comments stand for all 
tesserae. We may remark that a choice consisting on neg- 
ative values for the ha, n B , ■ ■ ■ cited diagonal elements 
and zero values for the n — ua, n — n B , ■ ■ ■ remaining el- 
ements, although not necessary, is efficient and simplifies 
the expression of the subsystem localization operators, 

^ = ^A^-^\u L{A) ^Aou L{A) ^ m m) 

Lb = ^ B \ B ^=^\U L ^X BQ U L ^^A) 



where X A , of size ua x tia, is the diagonal eigenvalue 
matrix of Eq. 17, and qU^^ is a rectangular matrix 
made of ua columns of qU_ l , and similarly for all tesserae. 

Using Eqs. 21, 22, . . . in 17, 18, ... , plus the fact that 
p is the projection operator of the occupied space (Eq. 6), 
results in the working equations of the Mosaico method 
for a localization method of choice L: 



tessera A 
FaV 



L * L A 



F-pFp + ^\u L X A(n) U L W 



(o)t 



tessera B 



F-pFp + ^\U L X B(n) oC/ L y 0)t 



(25) 



(26) 



Eq. 25 is the Mosaico equation for the embedded tessera 
A, Eq. 26 for the embedded tessera B, and so on. 
They are pseudoeigenvalue equations (even in the case 
of semiempirical methods with density-independent F 
Hamiltonians.) They can be solved with standard SCF 
iterative procedures. Starting with an initial guess, <p(°\ 
the procedure of the localization method of choice, L, is 
applied in order to compute qU_ l , which, together with p 
and F, give the embedded tessera Hamiltonians F A , F B , 
. . . for the current iteration. (Note that the elements of 



the diagonal matrices A 



A(n)i A B („), 



, are input real 



numbers.) Solving the eigenvalue equations leads to new 



orbitals which are used to update ip^ and iterate. At 
convergence, (fi^ = ip L and oU_ L is the unit matrix. Also, 
the eigenvalues of the tesserae are identical to the input 
values of X A / n \, X B r n y .... For instance, the ua non-zero 
values of X A coincide with the non-zero values of X A ^ 
associated with the localized orbitals of tessera A. 

Several options are open for iterative procedures lead- 
ing to the solutions of the Mosaico equations 25, 26, ... 
They all have to face two basic types of iterations: (1) 
The microiterations, or tessera iterations, which are stan- 
dard SCF iterations addressed to solve one of the embed- 
ded tessera equations, e.g. Eq. 25, for fixed orbitals of 
the other tesserae. All the methods available for speeding 
the solution of the canonical Hartree-Fock or Kohn-Sham 
equations can be used here. (2) The macroiterations, or 
mosaic iterations, which involve repeated solutions of all 
the embedded tessera equations. The macroiterations 
can be performed sequentially on a given list of tesserae, 
e.g. A, B, . . . , A, B, . . . , meaning that the orbitals com- 
puted for tessera A are used in the calculation of tessera 
B, and so on. Most interestingly, they can be performed 
in parallel, meaning that the calculations of all tesserae 
A, B, ... are done using the whole mosaic of orbitals 
from the previous macroiteration. This alternative is 
very important, because it allows to take full advantage 
of parallelism at a high level of the calculation. In other 
words, the full Mosaico calculation is made of a sequence 
of macroiterations, each of them consisting of a parallel 
set of Hartree-Fock, Kohn-Sham, or semiempirical cal- 
culations on the embedded tesserae A, B, .... Each of 
these tessera calculations, which can be time consuming, 
can be performed in a separate processor or computer. 

Alternataively, the macroiterations can be done a la 
carte, that is restricted to only one or several selected 
tesserae. This is to say that only some of the localized or- 
bitals are optimized whereas the rest of them are frozen. 
This is the embedded cluster approximation. Its reliabil- 
ity rests upon the transferability of the localized orbitals. 
Embedded cluster calculations are significantly cheaper 
than the full calculation of a complete system. They can 
be performed on a molecule or solid if a calculation on 
a similar molecule or solid has been carried out before 
in order to provide the orbitals of the embedding frozen 
tesserae. They are specially useful to study defects in 
solids, chemisorption, series of molecules with different 
substituents, or reactions taking place in local regions of 
large molecules. 

The Mosaico equations (Eqs. 25, 26, . . . ), which give 
the localized orbitals of a given localization method L 
and, in consequence, the same total energy and electron 
density than the canonical calculation, are the basis for 
approximations that make them useful in practice. These 
approximations, which resort to truncations supported 
by the localized nature of the orbitals, are systematic and 
converge to the exact result. They are basically twofold: 
(1) An orbital-specific basis set approximation can be 
adopted, where the localized orbitals of a tessera are ex- 
panded in a different basis set than the localized orbitals 
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of another tessera. This can be done because the tia 
occupied localized orbitals of tessera A are the only or- 
bitals to be computed by solving equation 25, whereas 
the n — tia orbitals of the other tesserae are discarded; 
consequently a local basis set can be used as long as it 
is sufficient for a good representation of the localized or- 
bitals ip L . Obviously this is true for any tessera. (2) A 
localization algorithm based on local rotations can be 
used in each interation instead of the standard algorithm 
of the localization method of choice, L. In the local ro- 
tations algorithm, the localized orbitals of tessera A are 
computed in each iteration using a subset of the ip^ or- 
bitals which is localized in tesserae not too distant from 
A. This can be done because the target localized orbitals 
are computed out of input localized orbitals rather than 
out of canonical orbitals. 



B. Use of orbital-specific basis sets, OSBS 

In a standard molecular calculation, all the orbitals of 
a molecule are expanded in a common basis set, which is 
the basis set of the molecule and consists of ubsf basis 
set functions, 

X = (I Xl>,| X2>,...,| Xn BSP )) , (27) 

which could be contracted Gaussian functions or other 
sort of local functions. In the orbital-specific basis set 
approximation, OSBS, the localized orbitals of tessera A 
are expanded in a local basis set made of bA functions, the 
localized orbitals of tessera B in a local basis set made 
of b B functions, and so on. Although not necessary, it 
is very convenient that the local basis sets are subsets 
of the global basis set, and that several tesserae share a 
number of basis set functions. The local basis sets can 
be represented by the following row vectors, 

tessera A : 

X A = (I XAl),\ XA2),---,\ XAb A )) , (28) 
tessera B : 

X B = (I Xbi),\ XB2>,...,| Xs6 b » . (29) 

Accordingly, eqs. 25, 26, . . . , take the usual matrix 
form, 18 

tessera A : F_ A Q.A = (2-A Aa (30) 

tessera B : F B C_ B =S B C_ B \ Bl (31) 

In Eq. 30, for instance, F A and S_ A are the b A x bA matrix 
representations of the F A and the unit operators in the 
X A basis set, 

F L a = x} a f aX a ^ S L a = x} a X a i (32) 



and C_ A is the bA x ha matrix of localized orbital coeffi- 
cients, 

!P L a = Xa£-A- ( 33 ) 

The solution of Eq. 30 can be attained using standard 
0(b\) diagonalization procedures. Except for low gap 
materials, where the degree of localization attainable is 
limited and the localized orbitals may decay slowly with 
distance, 38 the local basis set size bA is expected to re- 
main within reasonable limits for a diagonalization or, 
more generally, for an embedded cluster calculation. Ob- 
viously, the growth of bA will impose more practical lim- 
itations in 3D systems, like bulk solids and very big clus- 
ters, than in 2D and ID systems, like many molecules. 
In general, all the methods useful to speed up a stan- 
dard molecular calculation, like convergence acceleration 
methods, can be used with Eq. 30. But, in this case, 
additional advantage can be taken from the fact that 
only a small number of eigenvalucs/eigcnfunctions arc 
computed for each tessera, and efficient diagonalization 
algorithms used in CI calculations, like the multiroot 
Davidson-Liu method, 39 ' 40 can be applied to significantly 
reduce the prefactor of the 0(b A ) dependence. 

Let us comment on the calculation of the effective 
Hamiltonian matrix of a tessera, F_ A , where the local- 
ized nature of the orbitals is also profitable. For simplic- 
ity, we will call <fi the row vector of the n current local- 
ized orbitals {<p L , Eq. 19) at a given iteration, which can 
be written as the union of the row vectors (/> , <\> , . . . , 
that contain the tia, n B , ... current localized^ orbitals of 
tessera A, B, . . . respectively, 

The bA x bA effective Hamiltonian matrix of tessera A is 
F-A = X\ P AX A 

= Ka^2La ~ Ka^P^A + ^ A t^A(n)^ X a {^) 

The first term in the right hand side of Eq. 35 is a diag- 
onal block of the Hamiltonian matrix of the full system. 
The second term can be expanded in inter- tessera terms 
by inserting Eq. 15: 

X A PFPX A = 

N N 

EEfc^)(^l C )(^| (36) 

B=1C=1 

In the evaluation of Eq. 36, we can take advantage of the 
locality of basis sets and molecular orbitals by evaluat- 
ing a A AB interaction table and a A AB interaction table. 
A AB is set to if all the elements of the i)^ x matrix 
X A 4' B an d all the elements of the b B x ua matrix x B <P A 
have an absolute value lower than a given threshold, and 
is set to 1 otherwise. Similarly, A AB is set to if all the 
elements of the nA^n B matrix <$ A F (f> B have an absolute 
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value lower than a given threshold, and is set to 1 oth- 
erwise. Rearranging Eq. 36 and using these interaction 
tables, we can write 

N 

K\PFPX A = E A ab {x\i B ) (l B X A ) 

B=l 



B=2 



+ adjoint . 



(37) 



The use of the tesserae interaction tables A AB and A^g 
guarantees that the calculation of the present term does 
not scale as iV 2 because the number of tesserae in the 
neighborhood of (or interacting with) tessera A does not 
increase indefinitely with the size of the molecule. Also, 
note that the interaction tables do not need to be updated 
every macroiteration, because the localized orbitals do 
not experience big changes in size after a few iterations. 
It should be noticed that we assume that a linear-scaling 
method is used for the computation of the Hamiltonian 
matrix of the full system, 4,7 ' 41 of which diagonal and non- 
diagonal blocks are needed in Eqs. 35 and 37. 
The last term in Eq. 35 can be written as 

X A &*A(n)^ X A = 

E A ab (X A £ B ) A A{n) B (^ B X A ) , (38) 

B = l 
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FIG. 1: Diagram of the two basic parallel loops of a 
macroiteration. In the first loop, the embedded tessera ef- 
fective Hamiltonian matrices are computed and diagonalized 
(Eqs. 30, 31, ...). In the second loop, the oU_q A matrices 
of the localization method of choice, L, are computed out of 
the orbitals resulting from the first loop and the localization 
transformations (Eqs. 30, 31, ...) are performed. The high- 
lighted boxes would be the only to be entered in a embedded 
cluster calculation where tesserae #3 and #4 define the active 
cluster. 



where the A AB interaction table is used and A 



A(n)B 



is the 



n B x n B diagonal matrix resulting from the elements of 
\.A(n) corresponding to the localized orbitals of tessera B. 
For the particular choice of the arbitrary diagonal matrix 
Aa(«) leading to Eq. 23, Eq. 38 is further simplified: 



x} A ±KA{n)^ ' X A = 



A^A 



= A (E i m ^ fa i j xim 

where all X A i must be negative. 

The computation of the F A matrices (Eqs. 35, 37, and 
38) and their diagonalization (Eq. 30) can be performed 
in parallel. This is graphically indicated in the upper part 
of Fig. 1 . The global scaling of this diagonalization step is 

J2b=i ^(&b)i m tne case 01 an tesserae having the same 
local basis set size, b, and same number of interactions 
with other tesserae, the scaling is 0(b 3 N). 



are normally of order n 3 or higher. 24,25 In the present 
method, however, they are computed in each macroiter- 
ation out of other set of localized orbitals (Eq. 19). We 
can expect the contributions of the initial localized or- 
bitals ip^ to the target localized orbitals (p L to decay 
with distance. Accordingly, a A AB local rotation table 
can be computed, where A AB = 1 if the initial localized 
orbitals of tessera B are used to compute the target local- 
ized orbitals of tessera A (and viceversa) and A AB = 
otherwise. Several options to compute the A AB local ro- 
tation table are possible. Reasonable choices are to use 
the same criterium as for the A AB interaction table ex- 
cept for the use of a different threshold or, simply, make 
the A AB local rotation table identical to the A AB inter- 
action table. 

Arranging the initial localized orbitals in tesserae, 



(o) 



(40) 



C. Localization by local rotations 



Usually, a given set of localized orbitals of a molecule, 
ip L , is computed out of the canonical orbitals (Eq. 4). 
Well defined algorithms are routinely used to calculate 
the n x n unitary transformation matrix can U L , which 



the approximation of local rotations for the localization 
step can be written as 



tessera A : ^ = £ A^ ^ U% A , (41) 
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of the localization method of choice L (e.g. 3 in Pipck- 
Mezey and 5 in Edmiston-Ruedcnbcrg methods); if the 
same number of orbitals are used in the local rotations 
of all tesserae, t%lr, the scaling is 0(n LR N). 

Since the localization by local rotations among initial 
localized orbitals goes together with the use of orbital- 
specific basis sets, the resulting localized orbitals of 
tessera A must be represented with the basis set \ A alone. 
This can be achieved by truncation of ip L after the trans- 
formation (Eq. 41) or, alternatively, by its projection on 



the x A 



We have not ob- 



space, x A (x A X A/ 
served any practical advantage in projecting instead of 
truncating, neither in the precision attainable in the total 
energy nor in the convergence, in all the tests performed. 



FIG. 2: Schematic representation of the localization by local 
rotations (step 2 in Fig. 1). The unitary matrix oU_ L is re P~ 
resented together with the blocks that are computed in the 
localization parallel loop for tesserae #1 to #10. The block 
computed for tessera #8 {oU_lr#8 m Eq. 43) is highlighted. 
The subblock representing the actual localized orbitals of this 
tessera is represented with a rectangular fill box. 

N 

tessera B : = £ A|£ pg> U% B , (42) 

C=l 

In Eq. 41, oILca i s the n c x n A block of U_ L with the 
contributions of the initial localized orbitals of tessera C 
to the target localized orbitals of tessera A. The col- 
umn block made of all qU_ca with A A q = 1 is repre- 
sented by the fill rectangular box shown in Fig. 2. This 
column block can be calculated approximately as the col- 
umn block of the submatrix of qU_ l indicated in Fig. 2 by 
the open square box. We can call this submatrix qU£ ra , 
so that if (pf RA is a row vector with the initial local- 
ized orbitals of all tesserae C having = 1 (of length 
n L RA = Ec=i A ac n c), then 

oLLlra can be computed by simple application of the 
localization method of choice to the set of ti^ra initial 
localized orbitals p^ RA - Note that only the ha orbitals 
of P^ra corres P on ding to tessera A {<P A ) are taken from 
Eq. 43. Lowering the threshold of the local rotation table 
improves the precision of this approximation systemati- 

Ca Tlie computation of the oIZlaa matrices (Eqs. 43) can 
be performed in parallel. This is graphically indicated 
in the lower part of Fig. 1. The global scaling of this 
localization step is J2b=i ^( n Lfls)i where t is the order 



D. Symmetry 

The orbitals resulting from the present method do not 
belong to irreducible representations of the molecular 
point symmetry group because they are not eigenfunc- 
tions of the totally symmetric one-electron Hamiltonian 
F. However, except for symmetry breaking localization 
methods, they are related by the symmetry operations of 
the molecule and this fact can be used to reduce comput- 
ing time. 42 In effect, all tesserae orbitals can be obtained 
by applying molecular symmetry operations, R, to a list 
of symmetry independent tesserae, so that if A is a sym- 
metry independent tessera and tessera B is obtained from 
A by symmetry operation R, then ip L B — Rf A - In other 
words, the diagonalizations and localizations described 
in Sections II B and II C can be performed only on the 
list of symmetry independent tesserae. The computation 
°f E.A (E<F 35) can also profit from this property. 

Also, the site symmetry of the tesserae can be used in 
exactly the same way that molecular symmetry is used 
in standard molecular calculations, because the matrix 
of the effective Hamiltonian of any embedded tessera is 
blocked according to the irreducible representations of 
the local point symmetry group of that particular tessera. 

Besides the use of symmetry, one can also take advan- 
tage of quasisymmetry for speeding up purposes. So, if 
two tesserae A and B are quasisymmetry related by an 
operation R q (ip 1 ^ w Rqip 1 ^), they can be treated as sym- 
metry related for a number of macroiterations, in which 
the list of tesserae where the diagonalization and local- 
ization steps are performed can be shortened, finally re- 
leasing all quasisymmetry restrictions up to a full con- 
vergence of the Mosaico calculation. 



E. Summary of the Mosaico algorithm 

In summary, the algorithm for a Mosaico calculation 
of a molecule is the following: 

1. Take the current guess of localized orbitals of the 
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molecule and do, in parallel, for each symmetry 
independent tessera: 

(a) compute the embedded tessera effective 
Hamiltonian matrix (Eqs. 35, 37, and 38), 

(b) diagonalize it (Eq. 30) and compute new 
tessera orbitals (Eq. 33), 

2. Take all the orbitals produced in step 1 and do, in 
parallel, for each symmetry independent tessera: 

(a) compute the oLLlra unitary matrix (Eq. 43) 
corresponding to the localization method of 
choice L by applying the corresponding local- 
ization algorithm, and take the columns that 
correspond to the current tessera, 

(b) compute the target tessera localized orbitals 
(Eq. 41), 

3. Check for convergence and iterate on step 1 if nec- 
essary. Compute properties upon convergence. 

The parallel steps 1 and 2 are schematically repre- 
sented in Fig. 1. 

III. RESULTS 

In this section, we present the results of moni- 
toring calculations on poly(cthylcnc oxide) molecules, 
H(CH 2 OCH 2 ) m H, (Sec. Ill A) and three dimensional CO 
clusters, (CO) m , (Sec. IIIB) aimed at showing the con- 
vergence of the parallel calculations to the right solutions, 
the convergence of the total energies with the size of the 
orbital-specific basis sets towards the exact values, and 
the linear-scaling of the method. We also include em- 
bedded cluster calculations on defective systems result- 
ing from a chemical substitution of one O atom by a S 
atom in poly(ethylene oxide), (Sec. IIIC) aimed at show- 
ing the performance of embedded cluster calculations vs. 
full system calculations. 

All the calculations are single point energy calculations 
with an Extended Hiickel (EH) Hamiltonian. 36 This is a 
convenient choice to monitor the Mosaico procedure be- 
cause, on the one hand, the Hamiltonian and overlap 
matrices of the EH method are isomorphous with their 
ah initio counterparts, and on the other, the Hamiltonian 
is not self-consistent and its computation is straightfor- 
ward. In this way, the analysis of timing and scaling is fo- 
cused on the orbital optimization or diagonalization part 
and free from contaminations due to the computation of 
the Hamiltonian matrix. The calculations have been per- 
formed with a 2GB RAM personal computer with the 
program Mosaico; 43 the Extended Hiickel Hamiltonian 
and overlap matrices have been calculated with the pro- 
gram EHT. Although the loops performed mimic par- 
allel loops (Fig. 1), the total elapsed times shown here- 
after correspond to sequential loops performed in a single 
processor. In the present version of the program we have 



paid special attention to the scaling features, whereas the 
prefactors are highly improvable. This fact, together with 
the average performance of the personal computer used, 
makes the absolute values of the elapsed times shown in 
this section of little value; instead, it is the scaling of the 
method what is relevant. 



A. Poly(ethylene oxide) 

Poly(ethylene oxide) (PEO) 45 is a polymer widely used 
in the field of polymer electrolytes of molecular formula 
H(CH 2 OCH 2 ) m H. The molecular calculations we present 
for different numbers of monomers, to, use the local- 
ization method of projected localized molecular orbitals 
(PLMO) 46 (see Appendix) using a very simple set of ref- 
erence orbitals: sa+sb for all A-B pairs of bonded atoms, 
plus p y (0) + Pz(O) and p v (0) — p z {0) for all oxygen 
atoms, where y and z are local cartesian axis on the oxy- 
gen assuming the C-O-C atoms define a xy plane with 
the y axis bisecting the C-O-C angle. 

In all these calculation we used the following defini- 
tion for the tesserae or subsystems: One tessera made of 
10 orbitals localized in the sigma bonds and lone pairs 
of CH 3 OCH 2 - [which, in the localization method used 
for these particular calculations, are the 10 orbitals with 
one-to-one maximum overlap with the reference orbitals 

+ 8(d), S(d) + s(H 2 ), 8(d) + s(H 3 ), 8(d) + 

S (Or), py(Ox) + Pz (0 1 ), py(Ox) -Pz(Oi), *(Oi) + s(C 2 ), 
s(d) + s(H 4 ), s(d) + s(H 5 ), s(d) + s(d)}, plus to- 2 
tesserae made of 9 orbitals localized in the sigma bonds 
and lone pairs of the next CH 2 OCH 2 - groups, and a final 
tessera of 9 orbitals localized in the sigma bonds and lone 
pairs of the terminal CH 2 OCH 3 . 

We performed the Mosaico calculations using three dif- 
ferent orbital-specific basis sets: In the calculations la- 
beled IN, the orbitals of each tessera have been repre- 
sented with a subset of the global basis set consisting 
of all the basis set functions of the atoms involved in 
the bonds and lone pairs of the tessera plus those of the 
atoms involved in bonds and lone pairs of the first neigh- 
bor tesserae or monomers. In the 2N and 3N calculations, 
the orbital-specific basis sets were extended to second and 
third neighbor monomers. All the calculations converge 
to the same results regardless of the initial guess and the 
iteration procedure (parallel or any kind of sequential 
choice). The 9 localized orbitals which constitute one of 
the the bulk tesserae of H(CH 2 OCH 2 ) 30 H are shown in 
Fig. 3. 

Fig. 4 shows the total energy of H(CH 2 OCH 2 ) 30 H as 
a function of the orbital-specific basis sets used, which 
converges to the exact energy in the limit of a standard 
calculation where all orbitals are spanned in a common 
basis set. As the size of the OSBS increases, the lack of 
full orthogonality between orbitals of differente tesserae 
originated by the basis set truncation becomes negligible 
and, accordingly, the difference between the total energy 
properly computed and the total energy computed un- 
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A 



FIG. 3: The nine localized orbitals that constitute one of the 
bulk tesserae used in H(CH20CH2)3oH. 
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der the assumption that the orbitals are fully orthogonal 
vanishes. The energy loss per monomer due to the use 
of orbital-specific basis sets instead of a common basis 
set for all orbitals, is presented in Table I for several 
orbital-specific basis sets as a function of the molecular 
size. The predictability of the energy losses is apparent. 
Fig. 5 shows the convergence with macroiterations of the 
total energy of the H(CH 2 OCH 2 )3oH polymer at the 2N 
level of orbital-specific basis set. 

Fig. 6 shows the wall clock elapsed time per macroi- 
teration in the calculation of the H(CH20CH2) m H 
molecules as a function of the number of monomers m 
(which in this case coincides with the number of tesserae, 
N), of atoms, and of basis set functions. The times per 
macroiteration and tessera spent in the diagonalization 



TABLE I: Energy loss per monomer with respect to canon- 
ical calculations, in hartree/monomer, of poly(ethylen oxide) 
H(CH20CH2) m H molecules and (CO) m clusters, as calcu- 
lated with several orbital-specific basis sets. 
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FIG. 4: Total energy of H(CH 2 OCH2)3oH as a function of the 
orbital-specific basis set size. IN, 2N, and 3N labels indicate 
that each tessera is calculated with the basis set functions of 
all atoms up to first, second, and third neighbor monomers, 
respectively. Energy losses with respect to the exact canonical 
energy are indicated on the lines, in hartree units. Full line: 
correct calculation of the total energy. Dashed line: total en- 
ergy calculated under the assumption of perfect orthogonality 
between the localized orbitals. 
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FIG. 5: Convergence with macroiterations of the total en- 
ergy of H(CH20CH2)3oH with 2N orbital-specific basis set. 
Convergence to several subunits of hartree are indicated. A 
similar number of macroiterations for convergence has been 
found in all poly(ethylen oxide) polymers studied. 
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FIG. 6: Total wall clock elapsed times per macroiter- 
ation in the calculation of poly(ethylen oxide) molecule 
H(CH 2 OCH 2 ) m H (m = 10, 20, 50, 100, 200, 500, 1000, 2000). 



step and in the localization step are shown in Fig. 7. The 
horizontal lines reflect the O(N) scalings of both steps. 
The 0((b 3 )) scaling of the diagonalization step is clearly 
reflected in the separation between the IN, 2N, and 3N 
horizontal lines. The drop of the lines at low number of 
monomers is due to surface effects: The ratio between 
edge tesserae and bulk tesserae is significant in small 
polymers and, since edge tesserae are less demanding in 
terms of orbital-specific basis set and in terms of number 
of inter-tesserae interactions, they reduce the computing 
time with respect to what it would be if all of them were 
bulk tesserae. As we will see below, this fortunate surface 
effect, which lowers time with respect to a set of N bulk 
tesserae, is much more pronounced in 3D systems. The 
localization times shown in the bottom part of Fig. 7 are 
very similar in the IN, 2N and 3N calculations. This is 
so because they depend basically on the number of or- 
bitals used in the local rotations, which is the same in the 
three calculations. For the particular localization method 
we used for these calculation, PLMO, the elapsed times 
scale as ELi^Kab) » 0({n 3 LRp ))0(N). The small 
dependence with the size of the orbital-specific basis sets 
is related with the lengths of the matrix transformations 
in Eq. 47. 



B. (CO) m clusters 

We performed Mosaico calculations on three dimen- 
sional (CO) m clusters of several sizes, extracted from 



FIG. 7: Elapsed times per macroiteration and tessera in the 
calculation of poly(ethylen oxide) molecule H(CH20CH2) m H. 
Up: Diagonalization step. Down: Localization step. 



the crystalline structure of the a phase of solid carbon 
monoxide (P2i3 spatial group, CO bond length r(C- 
O) = 1.128 A, cell constant a = 5.64 A), 47 which is 
an interesting material known to experience irreversible 
photopolymcrization under pressure. 48 ' 49 In this crystal, 
a bulk CO molecule has a first-neigbhor coordination 
number 12 (CO molecules at a distance between centers 
of gravity of 3.99 A), and second- and third- neigbhor co- 
ordination numbers 18 and 42 (CO molecules at 5.64 A 
and 6.91 A, respectively). The (CO)@3 cluster is repre- 
sented in Fig. 8 as an example. 

In these calculations we also used the PLMO localiza- 
tion method, with the reference orbitals defined as the 
5to valence occupied canonical orbitals of the m isolated 
CO molecules. We defined each tessera to be made of 5 
orbitals localized in the spatial region of a CO molecule, 
which, for the chosen localization method, means the 5 
localized orbitals with maximum overlap with the canoni- 
cal orbitals of the CO molecule. The orbital-specific basis 
sets used have been labeled IN, 2N, and 3N when the ba- 
sis set of a tessera consists of the basis set functions of its 
C and O atoms plus the basis set functions of the atoms 
of the first, second, and third-neighbor CO molecules, 
respectively (see above). 

The energy losses per CO molecule (Table I) are small 
and diminish as the size of the OSBS increases. Times 
per macroiteration and tessera spent in the diagonal- 
ization step and in the localization step are shown in 
Fig. 9. The diagonalization times spent in a inner or 
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FIG. 8: (CO)63 cluster as a piece of the a phase of solid 
carbon monoxide (space group P2i3). 



bulk tessera, which is the most demaning, are included 
as a reference. This time is constant for all clusters where 
the inner tessera is a truly bulk tessera, that is, where its 
OSBS and its interaction tables die off before the limits 
of the cluster; this situation is reached in a smaller clus- 
ter for the IN OSBS and in a bigger cluster for the 3N 
OSBS. Naturally, the times spent per tessera in the full 
system calculations are lower than the times spent in the 
bulk tesserae, the differences showing the importance of 
surface effects: smaller clusters and bigger orbital-specific 
basis sets have a larger ratio of surface/bulk tesserae and, 
accordingly, a larger time reduction with respect to the 
bulk tesserae. In a large m regime, the bulk tesserae are 
dominant and set the assymptotic limit of the full sys- 
tem times, which scale linearly with the cluster size. The 
0((b 3 }) dependence is shown by the asymptotic values 
of the IN, 2N, and 3N lines, as well as by the values of 
(^ 3 ) = (Ei=i b\)/m printed along the 3N line in Fig. 9. 
The localization times (bottom of Fig. 9) are very much 
independent of the size of the OSBS because they are 
directly dependent in the number of occupied orbitals 
included in the local rotations, which is the same in IN, 
2N, and 3N calculations; the small dependence shown in 
the Figure is due to the fact that the transformations 
down to the basis set level depend on the OSBS size. 



C. Embedded cluster calculations 

The Mosaico method can be used for embedded cluster 
calculations, where the computational effort is focused on 
an active site of a molecule, comprising only a number 
of relevant tesserae, while the rest of it is taken from a 
previous calculation on a similar molecule and frozen. In 
this section we show the results of embedded cluster cal- 
culations on H(CH 2 OCH 2 ) p -CH 2 SCH 2 -(CH 2 OCH 2 ) p H. 

H(CH 2 OCH 2 )p-CH 2 SCH 2 -(CH 2 OCH 2 ) p H can be re- 
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FIG. 9: Elapsed times per macroiteration and tessera spent 
in the calculation of (CO) m clusters. Down: Localization 
step of the IN, 2N, and 3N OSBS calculations. Middle: Di- 
agonalization step of the IN and 2N OSBS calculations. Up: 
Diagonalization step of the 3N OSBS calculations; The values 
of (b 3 ) = Q3T = i b\)/m, where &a is the number of basis set 
functions in the OSBS of tessera A, are indicated. 



garded as the result of creating a chemical defect in 
H(CH 2 OCH 2 ) m H, with p = (m - l)/2, by substitution 
of the central oxygen by a sulfur atom. We may ex- 
pect the localized orbitals distant from the S atom in 
the "defective" molecule to be very similar to the or- 
bitals of the "perfect" molecule localized in the same spa- 
tial regions and, accordingly, we may take them from a 
previous calculation on H(CH 2 OCH 2 ) m H and use them 
in a Mosaico calculation of H(CH 2 OCH 2 ) p -CH 2 SCH 2 - 
(CH 2 OCH 2 ) p H where they are kept frozen; this defines 
an embedded cluster Mosaico calculation. Canonical and 
Mosaico calculations on the m = 21 polymer (using the 
same nuclear configurations before and after the creation 
of the S defect) reveal that the precision reached with a 
3N orbital-specific basis set on the perfect polymer re- 
quires a better OSBS after creating this chemical de- 
fect: A 5N basis set for the central defective tessera 
and its 5 neighbor tesserae together with a 3N basis 
set for the remaining tesserae, gives an energy error of 
4.5 x 10 -9 hartree/monomcr. Taking this into account, 
we performed embedded cluster calculations using a 5N 
OSBS for the variational tesserae and taking the orbitals 
that remain frozen for the rest of tesserae in the polymer 
from the 3N OSBS calculation on the "perfect" polymer 
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molecule H(CH20CH2) m H. The total energy errors of 
these embedded cluster calculations (with respect to the 
full molecular Mosaico calculations) on the polymers of 
21 and 201 monomers are shown in Fig. 10 as a function 
of the active cluster size (number of active tesserae). 

It is shown that the errors are too large to be accept- 
able if only the central defective tessera is active, whereas 
they drop to an acceptable value of 40 x 10~ 6 hartree 
when the orbitals of the first-neigbhor tesserae are opti- 
mized as well, and to less than 1 x 10~ 6 hartree when 
second-neigbhor tesserae are part of the variational em- 
bedded cluster. These errors are the same in the two 
to = 21 and m = 201 polymers, as corresponds to the 
local nature of the chemical defect. The times spent in 
the embedded cluster calculations, as a fraction of the 
times spent in the respective full system calculations, are 
shwon at the bottom of Fig. 10. Overall, this figure illus- 
trates the potcntiallity of embedded cluster calculations 
where the transferability of the localized orbitals of a lo- 
calization method of choice is exploited. 
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IV. CONCLUSIONS 



We presented a new linear-scaling method for the 
energy minimization step of semiempirical and first- 
principles Hartrcc-Fock and Kohn-Sham calculations, 
which we abbreviated under the name Mosaico. In this 
method, a set of embedded tessera pseudoeigenvalue cou- 
pled equations is solved in a building-block self-consistent 
fashion, which results in optimum occupied localized or- 
bitals of any localization method of choice, represented 
with orbital-specific basis sets. The Mosaico method is 
parallel at a high level of the calculation. It can be used 
in full system calculations as well as in embedded cluster 
calculations, where only an active fraction of the local- 
ized molecular orbitals of the whole system are varia- 
tional while the rest are taken from a similar molecule 
and kept frozen. 

We presented the results of monitoring, single point en- 
ergy calculations with the extended Hiickel Hamiltonian 
on poly(ethylene oxide) molecules and three dimensional 
carbon monoxide clusters with very large number of ba- 
sis set functions. Total energy losses due to the use of 
orbital-specific basis sets are small for reasonably small 
sizes of these and total energies converge to the canonical 
values when the orbital-specific basis sets are increased 
towards the limit of a common basis set for all the lo- 
calized orbitals. Convergence of total energy with self- 
consistent macroiterations is good and elapsed times per 
macroiterations have been shown to scale linearly with 
the molecular size. Besides the full system calculations, 
the good performance of the much less demanding em- 
bedded cluster approach has been shown in total energy 
calculations on defective systems resulting from chemi- 
cal substitution of an oxygen atom by a sulfur atom in 
poly(cthylene oxide) molecules. The transferability of the 
localized orbitals of a given localization method between 
similar molecules has been shown to lead to the same to- 
tal energy precision than full molecular calculations at a 
fraction of the computational cost. 
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FIG. 10: Embedded cluster calculations on H(CH 2 OCH 2 ) p - 
CH 2 SCH2-(CH 2 OCH2) p H (p = 10,50) with a 5N OSBS 
for the variational tessera and 3N OSBS for the remaining 
tesserae. The abscissa labels correspond to the number of 
variational tesserae included in the active clusters; the defec- 
tive tessera is always the central one. Above: Total energy 
errors, in /ihartree, with respect to full molecule Mosaico cal- 
culations using 5N OSBS in the 11 central tesserae and 3N 
OSBS in the rest; the result of the smallest cluster is indi- 
cated. Below: Elapsed time per macroiteration, as a fraction 
of the time per macroiteration of the full molecule Mosaico 
calculation. 
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APPENDIX 



being 



A. Projected localized molecular orbitals 

A simple localization method has been proposed 
by Ruedenberg et al., 46 which has not reached the 
popularity of other methods like Boys, 23 Edmiston- 
Ruedenberg, 24 and Pipek-Mezey 25 methods. Although it 
has been formulated in a context of atoms-in-molecules, 
it is a general method. Here, we reformulate it in the gen- 
eral case and in the case of localization by local rotations 
(Sec. IIC), using the present notation. 

Given a set of n input occupied (canonical or lo- 



calized) orthogonal orbitals of a molecule, ip 



(0) 



(0)\ 



9 



(«) 



n 



^PLMO 



: • 1 1 : a set of n reference 

arbitrary orbitals, £ = (I I 6), • • • I £n))> thc 
projected localized orthogonal orbitals, p 
(| pf LMO ), | <p% LMO ), . . . , | ^ LMO )), are defined as 
those resulting from a unitary transformation of tp(°) 
which are most similar to £. The PLMOs correspond 
to maximizing the one-to-one overlaps with the reference 
orbitals £, or the functional J27=i I ( ( ff LM ° I £») V under 
orthogonality constraints 50 and can be computed as 46 ' 50 



PLMO 



with 



(44) 



(45) 



In the approximation of local rotations (Section II C), the 
projected localized orbitals of a tessera A are computed 
out of Ulra (n-LRA > tia) input localized orbitals (p^RA 
and riLRA reference orbitals £, LRA , both of them including 
the localized/reference orbitals belonging to the tesserae 
included in the local rotations, with the expression: 



PL MO 



PLMO 



t L RA Q—LRA 



(46) 



ttPLMO 

oLLlra 



{( 



<p (0)t f 

^.LRA 2.LRA i 



V 0)t i 

K 2-LRA 2.LRA 



v> m e ) \ 

J—LRA 2.LRA J J, 



-col 



,(47) 



where it has been indicated that only the columns 
that correspond to tessera A are computed and used. 

Among thc advantages of thc PLMO localization 
method are its speed and its simplicity, because the usual 
iterative optimization procedures involved in localiza- 
tion 23 ' 24 ' 25 are substituted by a one-step calculation of 
the reciprocal square root of a symmetrical matrix, which 
is an 0(n 3 ) process (or 0(n 3 LRA ) in local rotations). Its 
main disadvantage is the requirement of an external, ar- 
bitrary set of reference orbitals, £. Although this is a 
limitation in calculations of reactivity, it is not a prac- 
tical problem in molecular structure calculations where 
the nature of the bonds is known in advance and finding 
good reference bond and lone pair orbitals is not diffi- 
cult. We may remark that the application of the PLMO 
method with a reference consisting of a given set of local- 
ized orbitals, e.g. Edmiston-Rucdenberg's, leads exactly 
to that set of orbitals. This property can be exploited in 
many ways and, in particular, in order to remove the ar- 
bitrariness inherent to the PLMO method. For instance, 
it can be used to produce reference orbitals for thc PLMO 
method (to be used in large molecules) out of Edmiston- 
Rucdenberg's or other non-arbitrary localized sets com- 
puted in selected sets of relatively small molecules. Also, 
a Mosaico calculation addressed to produce a given set 
of localized orbitals like Edmiston-Ruedenberg's can be 
safely performed using the chosen localization method in 
some macroiterations and the faster PLMO method, with 
the current ER orbitals as a reference set, in the rest of 
them. 
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